Abstract

In this paper we review an approach to estimating the causal effect of a time-varying
treatment on time to some event of interest. This approach is designed
for the situation where the treatment may have been repeatedly adapted to patient 
characteristics, which themselves may also be time-dependent. In this situation the 
effect of the treatment cannot simply be estimated by conditioning on the patient 
characteristics, as these may themselves be indicators of the treatment effect. This 
so-called time-dependent confounding is typical in observational studies. We discuss 
a new class of failure time models, structural nested failure time models, which can be 
used to estimate the causal effect of a time-varying treatment, and present methods
for estimating and testing the parameters of these models. 

1 Introduction 

This paper offers a new approach to estimating, from observational data, the causal effect
of a time-dependent treatment on time to an event of interest in the presence of
time-dependent confounding variables. This approach is based on a new class of failure time
models, the structural nested failure time models (SNFTM). The primary goal of this paper
is to motivate the need for structural nested failure time models. To achieve this goal in the 
most straightforward manner, we shall assume that the event times are observed without 
censoring, and that there is no missing or misclassified data. Additional complications that 
arise when these assumptions are not satisfied are discussed in Robins et al. (1992) and 
Robins (1993). 

The approach using SNFTMs will be useful in any observational study in which there
exist time-dependent risk factors that are also predictive for subsequent exposure to the
treatment under study, i.e. in any study where there are time-dependent covariates that
correlate with the final outcome of the treatment, but also with the amount or type of
treatment over time. This situation arises in any observational study in which there is 
"treatment by indication", i.e. the treatment is not predetermined by the investigator, 
but adapted to the current condition of the patient. The problem then is to distinguish 
between treatment effect and selection bias (i.e. confounding). For example, in an obser- 
vational study for the effect of AZT treatment on HIV-infected subjects, subjects with low 
CD4 lymphocyte counts at a given time are subsequently at increased risk of developing 
AIDS and are for that reason more likely to be treated with AZT. Thus the covariate
"low CD4-count" is a risk factor for AIDS, but is also a predictor of subsequent
treatment with AZT. The problem is then to isolate the effect of AZT treatment as given 
according to a predetermined plan (which may take into account covariates) from the con- 
founding effect of CD4-count. As a second example, many physicians withdraw women 
from exogenous estrogens at the time they develop an elevated blood cholesterol, since 
both exogenous estrogens and elevated blood cholesterol are considered possible cardiac 
risk factors. Therefore, in a study of the effect of postmenopausal estrogen on cardiac 
mortality, the covariate "cholesterol level" is a predictor of subsequent exposure
to estrogens, but also correlates with the outcome "cardiac mortality". As a third example,
in observational studies of the efficacy of cervical cancer screening on mortality, women 
who have had operative removal of their cervix due to invasive disease are no longer at 
risk for further screening (i.e. exposure), but are at increased risk for death. Therefore, the 
covariate, "operative removal of the cervix", is an independent risk factor for death, but 
also a predictor of subsequent exposure. As a final epidemiologic example, in occupational 
mortality studies, unhealthy workers who terminate employment early are at increased 
risk of death compared to other workers and receive no further exposure to the chemical 
agent under study. Therefore, the time-dependent covariate "employment status" is an 
independent risk factor for death, and a predictor of exposure to the study agent. 

Epidemiologists refer to the covariates in the preceding examples as "time-dependent
confounders". It may be important to analyze the data from any of the above studies using
the approach presented in this paper. 

For pedagogic purposes, we shall illustrate our models and assumptions throughout the 
paper by the problem of estimating, from data obtained in an observational study, the effect 
of treatment with the drug AZT on time to clinical AIDS in asymptomatic subjects with 
newly diagnosed human immunodeficiency virus (HIV) infection. We shall suppose that 
measurements on current AZT dosage as well as on various time-dependent covariates, such 
as weight, temperature, hematocrit, and CD4-lymphocyte count, are recorded at regularly 
spaced time points, until the development of clinical AIDS. These time points, which we
denote by $0 = \tau_0 < \tau_1 < \tau_2 < \cdots < \tau_K$, may for instance correspond to clinic visits
at which the measurements are obtained, with time defined as time since the diagnosis of 
HIV infection. 

Our goal will be to identify and estimate, for each treatment regime, the time-to-AIDS 
distribution that would have been observed if (typically counter to fact) each study subject 
had followed the AZT treatment history prescribed by the regime. We shall call each such 
distribution an AZT treatment regime-specific, counterfactual, time-to-AIDS distribution.
The treatment regimes we study need not be static. A treatment regime is a rule that
assigns to each possible covariate history through time $\tau_k$ an AZT dosage rate to be
taken in the interval $(\tau_k, \tau_{k+1}]$. A simple example of a treatment regime is "take an AZT
dosage of 1,000 milligrams of AZT daily in the interval $(\tau_k, \tau_{k+1}]$ if the hematocrit
measured at $\tau_k$ exceeds 30; otherwise take no AZT in the interval".
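
The rule quoted above can be written as a function of the covariate history. The following is a minimal sketch, not part of the original paper: the function names and the dictionary key `hematocrit` are hypothetical, and the rule happens to use only the most recent measurement although a regime may in general use the whole history.

```python
from typing import Callable, Sequence

def hematocrit_regime(covariate_history: Sequence[dict]) -> float:
    """Daily AZT dose (mg) for the interval that follows the latest visit.

    covariate_history holds one dict per visit so far (hypothetical layout);
    this particular rule only looks at the most recent hematocrit.
    """
    latest = covariate_history[-1]
    return 1000.0 if latest["hematocrit"] > 30 else 0.0

def doses_under_regime(regime: Callable, covariates: Sequence[dict]) -> list:
    """Apply a regime visit by visit, passing the history up to each visit."""
    return [regime(covariates[: k + 1]) for k in range(len(covariates))]

# Illustrative (made-up) covariate values at three visits.
visits = [{"hematocrit": 34}, {"hematocrit": 29}, {"hematocrit": 31}]
print(doses_under_regime(hematocrit_regime, visits))
```

Representing the regime as a function of the full history, rather than of the latest value only, keeps the sketch compatible with regimes that depend on earlier measurements.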

Our interest in AZT treatment regime-specific, counterfactual time-to-AIDS distributions
is based on the following considerations. Suppose, after the completion of the study,
a further individual with newly diagnosed HIV infection, whom we shall call "the infected 
subject", wishes to use the data from the completed study to select the AZT dosage sched- 
ule that will maximize his expected or median number of years of AIDS-free survival. 
If the "infected subject" is considered exchangeable with the subjects in the trial, then 
he would wish to follow the AZT treatment regime whose regime-specific, counterfactual 
time-to-AIDS distribution has the largest expected or median value.

In Section 2 we show that the AZT treatment regime-specific, counterfactual time-to-AIDS
distributions are identified from the observed data under the assumption that the
investigator has succeeded in recording sufficient data on the history of all covariates to
ensure that, at each time $\tau_k$, given the covariate history and the AZT treatment history
up till $\tau_k$, the AZT dosage rate in $(\tau_k, \tau_{k+1}]$ is independent of the regime-specific,
counterfactual time-to-AIDS. Robins (1992) refers to this assumption as the assumption of no
unmeasured confounding factors. In other words, under this assumption at each time point 
the treatment can be viewed as depending only on recorded information up till that point 
and external factors that are not predictive of (counterfactual) survival. 

In Section 3 we introduce structural nested failure time models (SNFTM). An SNFTM
models the magnitude of the causal effect of a (final) blip of AZT treatment in the interval
$(\tau_k, \tau_{k+1}]$ on time-to-AIDS, as a function of past AZT and covariate history. We show that,
under the assumption of no unmeasured confounding, the null hypothesis of no causal effect 
of AZT on time-to-AIDS is equivalent to the null hypothesis that the parameter vector of 
any SNFTM is 0. 

The term "structural" in SNFTM derives from terminology used in the social science and
econometric literature (e.g. Rubin (1978)). Our models are "structural", because they
directly model regime-specific, counterfactual time-to-AIDS distributions. In Sections 6
and 7 we discuss two different methods to fit SNFTMs and to use them for inference.

In Section 6 we show that, under the assumption of no unmeasured confounding,
SNFTMs can be understood as a component of a particular reparameterization of the 
joint distribution of the observables. We use this reparameterization to develop likelihood- 
based tests of the causal null hypothesis of no effect of AZT-exposure on time-to-AIDS. 
We also show how to estimate the AZT-treatment regime-specific, counterfactual time- 
to-AIDS distributions, in the case that the null hypothesis of no causal effect of AZT on 
time-to-AIDS is rejected. 

In Section 7 we present an alternative, semiparametric approach to test the null hypothesis
of no treatment effect and to estimate the parameters in an SNFTM. This approach,
G-estimation, has the advantage of avoiding the need to parameterize the distributions
appearing in the likelihood-based approach of Section 6 (e.g. the conditional distributions of
covariates given past treatment- and covariate history). Instead G-estimation uses a model
for the SNFTM and for the conditional distribution of treatment given past treatment- and 
covariate history. Tests and estimators based on G-estimation have the additional advan- 
tage that they can often be calculated with standard software. 

2 Formalization of the problem 

We fix a discrete time frame $\tau_0 = 0 < \tau_1 < \tau_2 < \dots < \tau_K$ throughout the paper, where $\tau_0$
is the time of enrollment in the study (and possibly also initiation of treatment), $\tau_1, \tau_2, \dots$
are the times of the clinic visits, and $\tau_K$ can be the time of the last clinic visit, or can be
chosen past the upper support point of the time-to-AIDS distribution. For simplicity the
times of the clinic visits are assumed to be the same for all patients (as long as they are 
alive). 

At each time point $\tau_k$ we measure a covariate vector $L_k$ for each patient, where $L_0$ may
also contain time-independent covariates and information collected before time $\tau_0$, and we
register the treatment given in the interval $(\tau_k, \tau_{k+1}]$ in a variable $A_k$, for instance the
AZT dosage, assumed constant during the interval. Besides covariates $L_k$ and treatments
$A_k$, we observe for each person a positive time $T$, for instance the time from enrollment
to the development of clinical AIDS. Thus the data observed on one person is a vector
$(\overline{L}_K, \overline{A}_K, T)$, where, for each $k = 0, 1, \dots, K$,

$$\overline{L}_k = (L_0, L_1, \dots, L_k), \qquad \overline{A}_k = (A_0, A_1, \dots, A_k).$$

For time instances $\tau_k > T$ the values $L_k$ and $A_k$ may be interpreted to be empty. For
simplicity we assume that the variables $L_k$ and $A_k$ take their values in countable sets,
denoted by $\mathcal{L}_k$ and $\mathcal{A}_k$. The total set of observations is a sample of $n$ independent and
identically distributed (i.i.d.) observations from the distribution of the random vector
$(\overline{L}_K, \overline{A}_K, T)$.

As is clear from the preceding display we use the overline notation to denote a "cumulative
vector". For simplicity of notation, it will be understood that whenever two
expressions such as $\overline{l}_k$ and $\overline{l}_{k-1}$ occur together, then $\overline{l}_{k-1}$ is the initial part of $\overline{l}_k$.

A "treatment regime" is a prescription for the treatment dosages fixed at the times $\tau_k$,
where at each time instant the prescribed treatment may depend on the observed covariate 
history until this time. We make this precise in the following definition. 

Definition 2.1 (treatment regimes). A treatment regime $g$ is a vector $g = (g_0, \dots, g_K)$ of
functions $g_k : \mathcal{L}_0 \times \cdots \times \mathcal{L}_k \to \mathcal{A}_k$.

The value $a_k = g_k(\overline{l}_k)$ of the $k$th coordinate of the treatment regime $g$ at covariate history $\overline{l}_k$
is interpreted as the dosage prescribed by treatment regime $g$ in the interval $(\tau_k, \tau_{k+1}]$ to a
patient with covariate history $\overline{l}_k$ following this regime (up to time $\tau_k$). The treatment at
time $\tau_k$ may depend on the full covariate history $\overline{l}_k = (l_0, \dots, l_k)$ until time $\tau_k$, not just on
$l_k$. We define maps $\overline{g}_k : \mathcal{L}_0 \times \cdots \times \mathcal{L}_k \to \mathcal{A}_0 \times \cdots \times \mathcal{A}_k$ by

$$\overline{g}_k(\overline{l}_k) = \bigl(g_0(l_0), g_1(\overline{l}_1), \dots, g_k(\overline{l}_k)\bigr).$$

To alleviate notation we may drop the subscripts $k$ or the overline in $g_k$ or $\overline{g}_k$ if the value of
$k$ is clear from the context. In particular $g(\overline{l}_k) = \overline{g}(\overline{l}_k) = \overline{g}_k(\overline{l}_k)$ are equivalent notations
for the complete treatment history.

We wish to study the effect of treatment using the observed data. Depending on these
data not all treatment regimes may be accessible to analysis. We call a treatment regime
"evaluable" (relative to the distribution of the data vector $(\overline{L}_K, \overline{A}_K, T)$) if whenever the
regime was followed until some time $\tau_k$ by some positive fraction of the population, then
it is also followed in the interval $(\tau_k, \tau_{k+1}]$ by a positive fraction of the population.

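Definition 2.2 (evaluable treatment regimes). A treatment regime $g$ is called evaluable if
for each $k$ and each $\overline{l}_k \in \mathcal{L}_0 \times \cdots \times \mathcal{L}_k$,

$$P\bigl(\overline{L}_k = \overline{l}_k, \overline{A}_{k-1} = g(\overline{l}_{k-1}), T > \tau_k\bigr) > 0 \;\Longrightarrow\; P\bigl(\overline{L}_k = \overline{l}_k, \overline{A}_k = g(\overline{l}_k), T > \tau_k\bigr) > 0.$$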
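
Definition 2.2 is a statement about the distribution of the data, but its sample analogue can be checked directly. The sketch below is an illustration under assumptions that are not in the paper: the record layout `(l_hist, a_hist, t_event)`, the function name, and the toy data are all hypothetical, and "positive probability" is replaced by "observed at least once".

```python
def is_empirically_evaluable(records, regime, taus):
    """Sample analogue of Definition 2.2 (a sketch, not the paper's procedure).

    records: list of (l_hist, a_hist, t_event), histories indexed by visit.
    regime:  function mapping a covariate history prefix to a treatment value.
    taus:    visit times tau_0 < tau_1 < ...
    """
    for k, tau_k in enumerate(taus):
        seen = {}  # covariate history up to k -> does anyone follow g at k?
        for l_hist, a_hist, t_event in records:
            if t_event <= tau_k:
                continue  # no covariates or treatment are recorded at tau_k
            followed_so_far = all(
                a_hist[j] == regime(l_hist[: j + 1]) for j in range(k)
            )
            if not followed_so_far:
                continue
            key = tuple(l_hist[: k + 1])
            follows_now = a_hist[k] == regime(l_hist[: k + 1])
            seen[key] = seen.get(key, False) or follows_now
        if not all(seen.values()):
            return False
    return True

# Made-up data: two visits, covariate "low"/"high", treatment 0/1.
taus = [0.0, 1.0]
records = [
    (("low", "low"), (1, 1), 2.5),
    (("low", "high"), (1, 0), 3.0),
    (("high", "high"), (0, 0), 1.8),
]
treat_if_low = lambda l: 1 if l[-1] == "low" else 0
always_treat = lambda l: 1
print(is_empirically_evaluable(records, treat_if_low, taus))
print(is_empirically_evaluable(records, always_treat, taus))
```

In this toy data set nobody with a "high" baseline covariate is treated, so the static regime "always treat" fails the check while the dynamic regime "treat if low" passes.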

Next we introduce counterfactual variables. These will be instrumental both to express
the aims of the statistical analysis, and to formulate our assumptions. In our mathematical
model the counterfactual variables are ordinary random variables $T^g$, one for each treatment
regime $g$, that are assumed to be defined on the same probability space as the data
vector $(\overline{L}_K, \overline{A}_K, T)$. The variable $T^g$ should be thought of as a patient's time to clinical
AIDS had she been treated according to treatment regime $g$. Because in actual fact the
patient receives treatment $\overline{A}_K$ (resulting in time to AIDS $T$), the variable $T^g$ is "counter to
fact". However, it gives a useful notation to express the distribution of interest, and will
be related to the observable variables by two assumptions.

Counterfactual variables referring to different subjects are assumed independent (cf.
Rubin (1978)), and hence we can formulate our set-up in terms of the set of random
variables $(T^g, T, \overline{L}_K, \overline{A}_K)$ referring to one person. We shall not be interested in the joint
distribution of counterfactual variables corresponding to different treatment regimes. We 
also do not need counterfactual versions of the covariates or treatments. 

We describe the aims of the statistical analysis in terms of the counterfactual variables. 
The G-null hypothesis of no effect of AZT on time-to-AIDS is the hypothesis that

$$P(T^{g_1} > t) = P(T^{g_2} > t) \quad \text{for all treatment regimes } g_1 \text{ and } g_2.$$

In Section 6 we derive fully parametric likelihood-based tests of this G-null hypothesis
based on a random sample from the distribution of the observables $(\overline{L}_K, \overline{A}_K, T)$, and a
parametric model for their joint distribution. In Section 7 we develop an alternative,
semi-parametric procedure with the same aim. 

If the G-null hypothesis is rejected, then the next goal is to identify and estimate,
for each treatment regime $g$, the survival curve $t \mapsto P(T^g > t)$, i.e. the survival curve
that would have been observed had a subject followed regime $g$. Specifically, if our
infected subject outside of the study mentioned in the introduction wishes to maximize
his expected years of AIDS-free survival, he would follow the regime $g$ that maximizes
$E\,T^g = \int_0^\infty P(T^g > t)\,dt$. Inference regarding the distribution of counterfactual variables
is referred to as causal inference, as the outcomes $T^g$ are interpreted as being the effect of
the treatment regime $g$.

Clearly it is impossible to make inference about the counterfactual survival distributions
$P(T^g > t)$ based on the observed data unless the variables $T^g$ and $(\overline{L}_K, \overline{A}_K, T)$ are related.
The assumed coupling of these variables on a given underlying probability space allows us to
make the following assumptions relating counterfactual and factual variables.

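Assumption 2.3 (consistency). For any treatment regime $g$, $\overline{l}_k \in \mathcal{L}_0 \times \cdots \times \mathcal{L}_k$ and $t \in (\tau_k, \tau_{k+1}]$,

$$\bigl\{T^g > t, \overline{L}_k = \overline{l}_k, \overline{A}_k = g(\overline{l}_k), T > \tau_k\bigr\} = \bigl\{T > t, \overline{L}_k = \overline{l}_k, \overline{A}_k = g(\overline{l}_k), T > \tau_k\bigr\}.$$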

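Assumption 2.4 (no unmeasured confounding). For any treatment regime $g$, for any
time $\tau_k$ and for any $\overline{l}_k \in \mathcal{L}_0 \times \cdots \times \mathcal{L}_k$,

$$A_k \perp\!\!\!\perp T^g \mid \overline{L}_k = \overline{l}_k, \overline{A}_{k-1} = g(\overline{l}_{k-1}).$$

Here the notation $X \perp\!\!\!\perp Y \mid Z = z$, borrowed from Dawid (1979), means that the random
variables $X$ and $Y$ are conditionally independent given the event $Z = z$.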

The consistency assumption, Assumption 2.3, couples the true and counterfactual survival
times $T$ and $T^g$ by merely stating that if until some time $\tau_k$ a patient is treated
exactly as prescribed by regime $g$, then she would die at some time in the interval $(\tau_k, \tau_{k+1}]$
under regime g if and only if she actually died at the same time. This implies in particular 
that if all patients were treated according to a predetermined treatment regime, then coun- 
terfactual and actual survival times coincide. This is the customary situation in clinical 
trials, but may fail to be the case in an observational study. 

The assumption of no unmeasured confounding, Assumption 2.4, can be expected to
hold if the observed covariate history contains sufficient information, so that at each
time $\tau_k$ the treatment $A_k$ can be assumed to depend on the covariate history $\overline{L}_k$ of a
patient up till that time and no other relevant information. The assumption would for
instance hold if at each time $\tau_k$ the treatment in the interval $(\tau_k, \tau_{k+1}]$ is assigned through
randomization within fixed levels of the covariates $\overline{L}_k$ and earlier treatments.

More specifically, in our AIDS example Assumption 2.4 may be expected to hold if
the following information is recorded in $\overline{L}_k$: all risk factors (i.e. predictors) of
regime-specific, counterfactual time-to-AIDS, other than prior AZT-history $\overline{A}_{k-1}$, that are used
by physicians and patients to determine the dose $A_k$ of AZT in $(\tau_k, \tau_{k+1}]$. Then, given
$\overline{L}_k$ and $\overline{A}_{k-1} = g(\overline{L}_{k-1})$, the treatment $A_k$ in the interval $(\tau_k, \tau_{k+1}]$ may be thought of
as depending only on external factors unrelated to the patient's prognosis regarding
time-to-AIDS, and hence as being independent of $T^g$. For example, since it is known that
physicians tend to prescribe AZT to subjects with low CD4-counts and a low CD4-count is
an independent predictor of time-to-AIDS, the assumption of no unmeasured confounding
would be false if $\overline{L}_k$ does not contain CD4-count history.

It is a basic objective of epidemiologists conducting an observational study to collect 
data on a sufficient number of covariates to ensure that Assumption 2.4 will be true. In
this paper, we assume this objective has been realized, while recognizing that, in practice, 
this may only approximately be the case. 



3 G-computation

We are interested in the distribution of the counterfactual, and hence unobservable,
variables $T^g$, as they indicate the success or failure from applying the treatment regime $g$. In
this section we show that, under Assumptions 2.3 and 2.4, the distribution of $T^g$ is
identifiable from the distribution of the observed data $(\overline{L}_K, \overline{A}_K, T)$ for each evaluable treatment
regime $g$. As a consequence, given a random sample from the latter distribution, the
distribution of $T^g$ is estimable, in principle.

In fact, the following G-computation formula gives an explicit expression for $P(T^g > t)$,
as well as several conditional survival functions, in terms of the distribution of the data
$(\overline{L}_K, \overline{A}_K, T)$.

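Theorem 3.1 (G-computation formula). Suppose that Assumptions 2.4 (no unmeasured
confounding) and 2.3 (consistency) hold, and that $g$ is an evaluable treatment regime. Then
for any $t > 0$, with $p$ defined by $\tau_p < t \le \tau_{p+1}$,

$$
\begin{aligned}
P(T^g > t) = \sum_{l_0} \cdots \sum_{l_{p-1}} \sum_{l_p} \Bigl[\, & P\bigl(T > t \mid \overline{L}_p = \overline{l}_p, \overline{A}_p = g(\overline{l}_p), T > \tau_p\bigr) \\
& \times \prod_{m=0}^{p} \Bigl\{ P\bigl(T > \tau_m \mid \overline{L}_{m-1} = \overline{l}_{m-1}, \overline{A}_{m-1} = g(\overline{l}_{m-1}), T > \tau_{m-1}\bigr) \\
& \qquad \times P\bigl(L_m = l_m \mid \overline{L}_{m-1} = \overline{l}_{m-1}, \overline{A}_{m-1} = g(\overline{l}_{m-1}), T > \tau_m\bigr) \Bigr\} \Bigr].
\end{aligned}
$$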



In the preceding theorem we interpret variables indexed by $-1$ as not present, and events
concerning only such variables as being empty. For instance, the conditional probability
$P\bigl(L_m = l_m \mid \overline{L}_{m-1} = \overline{l}_{m-1}, \overline{A}_{m-1} = g(\overline{l}_{m-1}), T > \tau_m\bigr)$ is to be read as the probability
$P(L_0 = l_0)$ when $m = 0$.

All conditional probabilities on the right side concern observable variables. Hence the
theorem gives an explicit description of the survival function of the counterfactual variable
$T^g$ in terms of the distribution of the data $(\overline{L}_K, \overline{A}_K, T)$.

It is instructive to evaluate the formula in the simple case that $K = 1$, when there
exists only one treatment $A_0$ applied in the single interval $(0, \tau_1]$. Then the G-computation
formula yields, for $t > 0$,

$$P(T^g > t) = \sum_{l_0} P\bigl(T > t \mid L_0 = l_0, A_0 = g(l_0)\bigr)\, P(L_0 = l_0).$$

This shows that in general the distribution of the counterfactual variable $T^g$ differs from
the distribution of $T$, which can be written in the form

$$P(T > t) = \sum_{l_0} P(T > t \mid L_0 = l_0)\, P(L_0 = l_0).$$

This difference is not too surprising, because the variable $T^g$ refers to the treatment regime
$g$, whereas $T$ relates to the observed outcomes under the actual treatments. Had all patients
received treatment $g$, then the two distributions would coincide. More notable is the
difference between the conditional distribution of $T$ given $A_0 = a_0$ and the distribution of
$T^g$ for the fixed treatment regime $g$ that assigns all patients to treatment $a_0$, i.e. $g(l_0) = a_0$.
These two survival distributions can be written

$$P(T^{a_0} > t) = \sum_{l_0} P(T > t \mid L_0 = l_0, A_0 = a_0)\, P(L_0 = l_0),$$

$$P(T > t \mid A_0 = a_0) = \sum_{l_0} P(T > t \mid L_0 = l_0, A_0 = a_0)\, P(L_0 = l_0 \mid A_0 = a_0).$$

The conditional distribution of $T$ given $A_0 = a_0$ is estimable, in principle, by taking
only those patients into account who happened to receive treatment $a_0$. The outcome
distribution of this subset of patients may however be different from the distribution of
the counterfactual variable $T^{a_0}$, as a result of "selection bias". In the actual world some
patients may be assigned other treatments than $a_0$, where the assignment $A_0$ may correlate
with the covariate variable $L_0$. Therefore, the conditional distribution of $L_0$ given $A_0$ and
the unconditional distribution of $L_0$ may differ, and consequently so may the right hand sides of the display. It
is the counterfactual survival function $t \mapsto P(T^{a_0} > t)$ that is the relevant one to judge
the causal effect of treatment $a_0$. Randomization of treatment over patients within fixed
levels of the covariate would have made $L_0$ and $A_0$ independent, and the difference would
disappear. The protocol of a controlled experiment may include such randomization, but
in an observational study it cannot be taken for granted. The G-computation formula
then shows, under some assumptions, how we can still compute the relevant outcome 
distributions from the observed data distribution. 
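
The contrast between the two displays for a single treatment interval can be made concrete with a small numerical sketch. All probabilities below are invented for illustration and do not come from any study; the function names are hypothetical. Treatment is given mostly to patients with the unfavourable covariate value "low", mimicking treatment by indication.

```python
# Hypothetical joint law: L0 in {"low", "high"}, A0 in {0, 1}, fixed time t.
p_l = {"low": 0.4, "high": 0.6}               # P(L0 = l)
p_a1_given_l = {"low": 0.8, "high": 0.2}      # P(A0 = 1 | L0 = l)
surv = {("low", 1): 0.5, ("low", 0): 0.3,     # P(T > t | L0 = l, A0 = a)
        ("high", 1): 0.9, ("high", 0): 0.8}

def g_formula(a):
    """Counterfactual survival: average over the marginal law of L0."""
    return sum(surv[(l, a)] * p_l[l] for l in p_l)

def crude(a):
    """Observed survival among those with A0 = a: average over P(L0 | A0 = a)."""
    def p_a(l):
        return p_a1_given_l[l] if a == 1 else 1.0 - p_a1_given_l[l]
    p_a_marginal = sum(p_a(l) * p_l[l] for l in p_l)
    return sum(surv[(l, a)] * p_a(l) * p_l[l] for l in p_l) / p_a_marginal

print("G-formula:", g_formula(1), g_formula(0))
print("Crude:    ", crude(1), crude(0))
```

With these made-up numbers treatment improves survival within each covariate level, and the G-computation values reflect this, whereas the crude comparison of treated and untreated patients points in the opposite direction because the treated group is enriched with "low" patients.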

We can make further comparisons after deriving a similar representation for conditional 
probabilities involving the counterfactual variables. 

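Theorem 3.2 (G-computation formula). Under the assumptions of Theorem 3.1, for any
$k \in \{0, 1, 2, \dots, K\}$ and any $\overline{l}_k$ such that $P\bigl(\overline{L}_k = \overline{l}_k, \overline{A}_{k-1} = g(\overline{l}_{k-1}), T > \tau_k\bigr) > 0$, for any
$t > \tau_k$, and with $p \ge k$ defined by $\tau_p < t \le \tau_{p+1}$,

$$
\begin{aligned}
P\bigl(T^g > t \mid{} & \overline{L}_k = \overline{l}_k, \overline{A}_{k-1} = g(\overline{l}_{k-1}), T > \tau_k\bigr) \\
= \sum_{l_{k+1}} \cdots \sum_{l_{p-1}} \sum_{l_p} \Bigl[\, & P\bigl(T > t \mid \overline{L}_p = \overline{l}_p, \overline{A}_p = g(\overline{l}_p), T > \tau_p\bigr) \\
& \times \prod_{m=k+1}^{p} \Bigl\{ P\bigl(T > \tau_m \mid \overline{L}_{m-1} = \overline{l}_{m-1}, \overline{A}_{m-1} = g(\overline{l}_{m-1}), T > \tau_{m-1}\bigr) \\
& \qquad \times P\bigl(L_m = l_m \mid \overline{L}_{m-1} = \overline{l}_{m-1}, \overline{A}_{m-1} = g(\overline{l}_{m-1}), T > \tau_m\bigr) \Bigr\} \Bigr].
\end{aligned}
\tag{1}
$$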



Again variables indexed by $-1$ should be read as not being present. Furthermore, a
repeated summation of the form $\sum_{l_{k+1}} \cdots \sum_{l_p} a_{k,p}(l_k, l_{k+1}, \dots, l_p)$ is considered to be the
single term $a_{k,k}(\overline{l}_k)$ if $k = p$, whereas the product $\prod_{m=k+1}^{p}$ is to be read as 1 in this case. The
summation may be restricted to terms whose conditioning events have positive probability.

Again we may evaluate this formula in the simple case of a single treatment interval.
Then the formula in the preceding theorem (with $k = 0 = p$, $K = 1$) reduces to

$$P(T^g > t \mid L_0 = l_0) = P\bigl(T > t \mid L_0 = l_0, A_0 = g(l_0)\bigr).$$

The right side is precisely the conditional distribution of the actual survival time for a
subject with covariate $l_0$ following the treatment regime $g$. Intuitively, the conditional
probabilities $P(T > t \mid L_0 = l_0, A_0 = g(l_0))$ are the correct ones for evaluating the quality of
treatment $g$ for a subject with covariate value $l_0$, and the equality in the preceding display
is actually a direct consequence of the Assumptions 2.3 and 2.4 relating the counterfactual
and factual survival times. (We may add $A_0 = g(l_0)$ in the conditioning event on the left
by Assumption 2.4 and next use Assumption 2.3 to see that $T^g$ may be replaced by $T$.)

Henceforth, we shall denote the right side of (1) by $s_{\overline{l}_k, g}(t)$. For $k = -1$ this reduces
to the right side in Theorem 3.1, and we write it as $s_g(t)$, interpreting $\overline{l}_{-1}$ as empty. Then
Theorems 3.1 and 3.2 can be reformulated as saying that under Assumptions 2.3 (consistency)
and 2.4 (no unmeasured confounding), for every evaluable treatment regime $g$,

$$P(T^g > t) = s_g(t)$$

and, for every $k = 0, 1, \dots, K$,

$$P\bigl(T^g > t \mid \overline{L}_k = \overline{l}_k, \overline{A}_{k-1} = g(\overline{l}_{k-1}), T > \tau_k\bigr) = s_{\overline{l}_k, g}(t).$$

These functions are survival functions of distributions that concentrate on $(\tau_k, \infty)$.

Inspection of the G-computation formula shows that $s_{\overline{l}_k, g}$ is a (complicated) function
of the distribution of the data vector $(\overline{L}_K, \overline{A}_K, T)$ and depends on this distribution only
through the conditional distributions of the covariates and the survival time given the past,
given by

$$P\bigl(L_m = l_m \mid \overline{L}_{m-1} = \overline{l}_{m-1}, \overline{A}_{m-1} = \overline{a}_{m-1}, T > \tau_m\bigr) \tag{2}$$

and

$$P\bigl(T > t \mid \overline{L}_{m-1} = \overline{l}_{m-1}, \overline{A}_{m-1} = \overline{a}_{m-1}, T > \tau_{m-1}\bigr). \tag{3}$$

In particular, the functions $s_{\overline{l}_k, g}$ do not depend on conditional laws of the treatment
variables $A_m$ given the past.

Proof of Theorems 3.1 and 3.2. We prove Theorems 3.1 and 3.2 by backward induction
on $k$, for fixed $t$ (and hence also fixed $p$). Formula (1) with $k = -1$ can be read as the
formula given by Theorem 3.1, so we restrict to proving (1).
For $k = p$ the left side of (1) is equal to

$$
\begin{aligned}
P\bigl(T^g > t \mid{} & \overline{L}_p = \overline{l}_p, \overline{A}_{p-1} = g(\overline{l}_{p-1}), T > \tau_p\bigr) \\
&= P\bigl(T^g > t \mid \overline{L}_p = \overline{l}_p, \overline{A}_p = g(\overline{l}_p), T > \tau_p\bigr) \\
&= P\bigl(T > t \mid \overline{L}_p = \overline{l}_p, \overline{A}_p = g(\overline{l}_p), T > \tau_p\bigr),
\end{aligned}
$$

where in the first equality we can add $A_p = g_p(\overline{l}_p)$ in the conditioning event by
Assumption 2.4 of no unmeasured confounding, and in the second equality we can replace the event
$T^g > t$ by the event $T > t$, because of the Assumption 2.3 of consistency.

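The induction step is proved by similar arguments. Supposing that (1) holds for $k \le p$,
we shall deduce that it also holds for $k - 1$. We have

$$
\begin{aligned}
P\bigl(T^g > t \mid{} & \overline{L}_{k-1} = \overline{l}_{k-1}, \overline{A}_{k-2} = g(\overline{l}_{k-2}), T > \tau_{k-1}\bigr) \\
&= P\bigl(T^g > t \mid \overline{L}_{k-1} = \overline{l}_{k-1}, \overline{A}_{k-1} = g(\overline{l}_{k-1}), T > \tau_{k-1}\bigr) \\
&= P\bigl(T^g > \tau_k \mid \overline{L}_{k-1} = \overline{l}_{k-1}, \overline{A}_{k-1} = g(\overline{l}_{k-1}), T > \tau_{k-1}\bigr) \\
&\qquad \times P\bigl(T^g > t \mid \overline{L}_{k-1} = \overline{l}_{k-1}, \overline{A}_{k-1} = g(\overline{l}_{k-1}), T > \tau_{k-1}, T^g > \tau_k\bigr).
\end{aligned}
$$

The first equality follows by the assumption of no unmeasured confounding, while the
second follows by conditioning on the event $T^g > \tau_k$, where we note that $t > \tau_k$, because
$t > \tau_p \ge \tau_k$. By the consistency assumption we can replace the event $T^g > \tau_k$ by the
event $T > \tau_k$ without changing the events or probabilities. Next we can rewrite the second
probability as a sum by conditioning on the variable $L_k$, to obtain that the preceding
display is equal to

$$
\begin{aligned}
& P\bigl(T > \tau_k \mid \overline{L}_{k-1} = \overline{l}_{k-1}, \overline{A}_{k-1} = g(\overline{l}_{k-1}), T > \tau_{k-1}\bigr) \\
& \quad \times \sum_{l_k} P\bigl(T^g > t \mid \overline{L}_k = \overline{l}_k, \overline{A}_{k-1} = g(\overline{l}_{k-1}), T > \tau_k\bigr) \\
& \qquad \times P\bigl(L_k = l_k \mid \overline{L}_{k-1} = \overline{l}_{k-1}, \overline{A}_{k-1} = g(\overline{l}_{k-1}), T > \tau_k\bigr).
\end{aligned}
$$

Finally we replace the probability involving the counterfactual variable $T^g$ by the right
side of (1), which is permitted in view of the induction hypothesis. This yields the right
side of (1) for $k - 1$, and concludes the induction step. □

4 Reparameterization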



To investigate the effect of a given treatment regime $g$ on survival, it suffices to know
the conditional distributions given in (2) and (3). Given these distributions we can
compute the counterfactual survival functions by using the G-computation formula, given by
Theorem 3.1.

Because carrying out this computation may be a formidable task, we may perform the 
calculation by simulation methods, rather than by analytical calculation. Robins (1986, 
1987, 1988) provides a Monte Carlo algorithm, called the "Monte Carlo G-computation
algorithm", for evaluating the functions $s_g$ that satisfactorily resolves potential difficulties
with the analytical computation. We refer the reader to these papers for further discussion. 
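
The idea of replacing the nested sums of the G-computation formula by simulation can be sketched as follows. This is only an illustration of forward simulation under a regime, not the algorithm as specified in the papers cited above: the function names, the toy conditional models and all numbers are assumptions made for the example, and survival is only evaluated at the visit times.

```python
import random

def mc_g_computation(regime, draw_covariate, interval_survival_prob,
                     n_intervals, n_sim=10000, seed=0):
    """Estimate P(T^g > tau_{k+1}), k = 0..n_intervals-1, by forward simulation."""
    rng = random.Random(seed)
    alive_counts = [0] * n_intervals
    for _ in range(n_sim):
        l_hist, a_hist = [], []
        for k in range(n_intervals):
            # Draw L_k from a model for the covariate law, as in (2).
            l_hist.append(draw_covariate(l_hist, a_hist, rng))
            # Treatment is set by the regime, not drawn from the data.
            a_hist.append(regime(l_hist))
            # Survive the interval with a probability modelled as in (3).
            if rng.random() >= interval_survival_prob(l_hist, a_hist):
                break
            alive_counts[k] += 1
    return [count / n_sim for count in alive_counts]

# Hypothetical conditional models (illustration only).
def draw_covariate(l_hist, a_hist, rng):
    p_low = 0.3 if (a_hist and a_hist[-1] == 1) else 0.5
    return "low" if rng.random() < p_low else "high"

def interval_survival_prob(l_hist, a_hist):
    base = 0.7 if l_hist[-1] == "low" else 0.9
    return base + 0.05 * a_hist[-1]

treat_if_low = lambda l_hist: 1 if l_hist[-1] == "low" else 0
curve = mc_g_computation(treat_if_low, draw_covariate,
                         interval_survival_prob, n_intervals=3, n_sim=2000)
print(curve)
```

In practice the two model functions would be replaced by fitted estimates of the conditional laws, and different regimes would be compared by running the simulation once per regime.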

A difficulty is that the distributions in (2) and (3) will typically be unknown and must
be estimated from the data. One possibility is to specify models for (2) and (3), for instance
logistic or Cox models, and next estimate the unknown parameters from the data. The
function $s_g$ can then be estimated using the Monte Carlo G-computation algorithm with
model derived estimates. Robins (1986, 1987) provides several worked examples of this
approach. 

This approach has a number of unattractive features. Estimation of the function $s_g$
according to the preceding scheme and without confidence intervals, may be feasible, but
testing whether treatment affects the outcome is complicated. The models used to specify
$s_g$ will usually be rough approximations, and the null hypothesis of no treatment effect will
be a complex function of all parameters. Standard statistical software may not apply, and
in large datasets the null hypothesis will usually be rejected, just because of model
misspecification (cf. Robins (1986, 1987, 1988, 1989)). In this paper we take a different approach,
based on a reparameterization of the joint distribution of the observations $(\overline{L}_K, \overline{A}_K, T)$
using structural nested failure time models (SNFTM).

SNFTMs are models for the causal effect of skipping a "last" treatment dose given the 
past, thus reverting to the "baseline treatment". To make this precise, suppose that there 
is a certain baseline treatment regime, which we shall refer to as "no treatment". This 
could for instance be "zero medication", and consequently we shall let a zero in the sets 
$\mathcal{A}_k$ of treatment dosages refer to treatment under the baseline treatment regime.

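At any time point a doctor could switch a patient to the baseline regime, at least
conceptually, and leave her there. Let $(\overline{a}_k, 0)$ be an abbreviation for the treatment regime
$g = (a_0, \dots, a_k, 0, \dots, 0)$, i.e. the $m$th coordinate function of $g$ is given by

$$
g_m(\overline{l}_m) =
\begin{cases}
a_m & \text{for any value of the covariate vector } \overline{l}_m \text{ if } m \le k, \\
0 & \text{for any value of the covariate vector } \overline{l}_m \text{ if } m > k.
\end{cases}
$$

Henceforth, we shall always assume that Assumptions 2.3 (consistency) and 2.4 (no
unmeasured confounding) are satisfied. Then, by Theorem 3.1, if the treatment regime $(\overline{a}_k, 0)$
is evaluable, the function

$$t \mapsto s_{\overline{l}_k, (\overline{a}_k, 0)}(t)$$

(by definition the right side of (1) with $g = (\overline{a}_k, 0)$) is the conditional survival function of the
counterfactual survival time $T^{(\overline{a}_k, 0)}$ given the treatment- and covariate history $(\overline{l}_k, \overline{a}_{k-1})$
up to time $\tau_k$, and given that $T^{(\overline{a}_k, 0)} > \tau_k$. Define "shift-functions" $\gamma$ by

$$\gamma_{\overline{l}_k, \overline{a}_k}(t) = s^{-1}_{\overline{l}_k, (\overline{a}_{k-1}, 0)} \circ s_{\overline{l}_k, (\overline{a}_k, 0)}(t), \tag{4}$$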



where the inverse $s^{-1}$ is the quantile function of the corresponding survival function.

The functions $\gamma$ map percentiles of the distribution of the random variable $T^{(\overline{a}_k, 0)}$ into
those of the distribution of the random variable $T^{(\overline{a}_{k-1}, 0)}$:

$$s_{\overline{l}_k, (\overline{a}_{k-1}, 0)} \circ \gamma_{\overline{l}_k, \overline{a}_k} = s_{\overline{l}_k, (\overline{a}_k, 0)}. \tag{5}$$

The functions $\gamma$ thus measure the effect of skipping the "last" treatment dose $a_k$ given
the covariate and treatment history $(\overline{l}_k, \overline{a}_{k-1})$. We assume that the survival functions are
continuous and strictly decreasing, so that (4) and (5) give equivalent definitions.

If the "last treatment" a k has no effect, then the functions s 7 /_ „\ and s 7 /_ „\ are 

identical, and the function 77 _ is the identity function. More generally, the function jj - 
can be seen to measure the effect of the treatment a k given in [77., r k+ i) on (counterfactual) 
survival. This is illustrated in Figure ^ 




Figure 1: Illustration of the shift-function $\gamma$. In this picture the function
$s_{\bar l_k, (\bar a_{k-1}, 0)}$ lies to the left of the function $s_{\bar l_k, (\bar a_k, 0)}$,
indicating that skipping the treatment $a_k$ decreases survival for patients with covariate and
treatment history $(\bar l_k, \bar a_{k-1})$. In this case the function $\gamma_{\bar l_k, \bar a_k}$
is below the identity.



Conversely, if the shift function $\gamma_{\bar l_k, \bar a_k}$ is equal to the identity function, then the
distributions of the counterfactual variables $T^{(\bar a_k, 0)}$ and $T^{(\bar a_{k-1}, 0)}$ coincide for
patients with past covariate- and treatment history $\bar l_k$ and $\bar a_{k-1}$. This suggests that, if
$\gamma_{\bar l_k, \bar a_k}$ is the identity function for all values of $\bar l_k$, $\bar a_k$ and $k$, then
treatment does not affect the outcome of interest: skipping the last treatment does not affect the
outcome of interest, next skipping the second-last treatment does not affect the outcome of interest,
etcetera.

For a rigorous proof of this conclusion it is necessary that sufficiently many treatment
regimes are evaluable, because the functions $s_{\bar l_k, g}$ (defined in terms of the distribution
of the observable data) are equal to the counterfactual survival
distributions only if the treatment regime $g$ is evaluable. For instance, the treatment regime
$g = (\bar a_k, 0)$ need not be evaluable for all $\bar a_k$ and hence the distributions of the counterfactual
variables $T^{(\bar a_k, 0)}$ and/or $T^{(\bar a_{k-1}, 0)}$ may not be identifiable from the observed data. To
overcome this difficulty we assume that the baseline treatment regime is "admissible". A
treatment regime is called "admissible" if in every situation there is a positive probability
for this regime to be implemented in the next step. As applied to the baseline regime 0,
this property takes the form of the following assumption.

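Assumption 4.1 (admissible baseline treatment regime). For each $k$, each $\bar l_k \in \bar{\mathcal{L}}_k$ and
each $\bar a_{k-1} \in \bar{\mathcal{A}}_{k-1}$,

$$P(\bar L_k = \bar l_k, \bar A_{k-1} = \bar a_{k-1}, T > \tau_k) > 0 \implies P(\bar L_k = \bar l_k, \bar A_{k-1} = \bar a_{k-1}, A_k = 0, T > \tau_k) > 0.$$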
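
In a data set, Assumption 4.1 can be probed (though not verified, since an empirical frequency of zero does not prove a probability of zero) by checking that the baseline dose occurs after every observed history. The following is a minimal sketch, assuming each at-risk subject-visit has been encoded as a pair of a hashable history and a dose; the function name `inadmissible_histories` and the toy records are hypothetical.

```python
from collections import defaultdict


def inadmissible_histories(records):
    """Return observed histories after which the baseline dose 0 never occurs.

    Each record is a pair (history, a_k): `history` is any hashable encoding of
    (k, covariate history up to k, treatment history up to k-1) for a subject
    still at risk at visit k, and `a_k` is the dose given at visit k.
    """
    saw_baseline = defaultdict(bool)
    for history, a_k in records:
        # Remember whether dose 0 has been seen at least once for this history.
        saw_baseline[history] |= (a_k == 0)
    return sorted(h for h, ok in saw_baseline.items() if not ok)


# Toy data (hypothetical): history = (k, covariates so far, past doses).
records = [
    ((0, ("low",), ()), 1),
    ((0, ("low",), ()), 0),
    ((0, ("high",), ()), 1),
    ((1, ("low", "low"), (1,)), 0),
]
print(inadmissible_histories(records))
```

Any history returned by this check is one for which the data give no direct information about the baseline dose, so the corresponding shift function would rest on extrapolation.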

Under this assumption the shift functions $\gamma_{\bar l_k, \bar a_k}$ are identifiable for all values of
$(k, \bar l_k, \bar a_k)$ with $P(\bar L_k = \bar l_k, \bar A_k = \bar a_k, T > \tau_k) > 0$, and fully
characterize the potential effect of any treatment regime. This is the content of the following theorem,
whose proof is deferred to Appendix A. (As shown in Lok (2001, Section 2.12), Assumption 4.1 can be avoided
if one allows 0 to be a so-called admissible baseline course of treatment, which may depend not
only on past covariate- but also on past treatment history. Some admissible baseline
course of treatment, which has a positive probability of occurring after any observed
treatment- and covariate history, always exists.)

Theorem 4.2 Under Assumptions 2.4 (no unmeasured confounding), 2.3 (consistency)
and 4.1 (admissible baseline treatment regime), the distribution of $T^g$ is the same under
all evaluable treatment regimes $g$ if and only if the shift-function $\gamma_{\bar l_k, \bar a_k}$ is the
identity for all $(k, \bar l_k, \bar a_k)$ with $P(\bar L_k = \bar l_k, \bar A_k = \bar a_k, T > \tau_k) > 0$.

It follows that the functions $\gamma_{\bar l_k, \bar a_k}$ characterize the null hypothesis of no treatment
effect. Because they also possess an easy interpretation in terms of the effect of a "last
blip" of treatment, it is attractive to model these functions rather than the set of conditional
distributions in (2) and (3). A structural nested failure time model is a parametrized family
of functions used to model the functions $\gamma_{\bar l_k, \bar a_k}$. Each of the model functions is an
increasing function on $[\tau_k, \infty)$ (that can arise as a quantile-distribution function), with the
identity function referring to the absence of the treatment effect.

With the parameter denoted by $\psi = (\psi_1, \psi_2, \psi_3)$, one example of an SNFTM would be

$$\gamma^{\psi}_{\bar l_k, \bar a_k}(t) = \tau_k + (\min\{\tau_{k+1}, t\} - \tau_k)\, e^{\psi_1 a_k + \psi_2 a_k a_{k-1} + \psi_3 a_k l_k} + (t - \tau_{k+1})\, 1_{\{t > \tau_{k+1}\}}.$$

If $\psi = 0$, then this function reduces to the identity function, indicating that the parameter
value $\psi = 0$ corresponds to the absence of a treatment effect. For nonzero values of $\psi$ the
model corresponds to a "change of time scale" depending on present and past treatment
$(a_k, a_{k-1})$ and present covariate $(l_k)$. The variable $L_k$ might for instance be the univariate
covariate CD4 lymphocyte count at $\tau_k$, and the variable $A_k$ the AZT prescription. Then
the given model allows for interaction between CD4 lymphocyte count and treatment, and
could of course be extended with other factors. Figure 2 shows two typical functions $\gamma$
following this model.

Figure 2: Examples of shift functions. The picture shows the identity function (dashed)
and the function $t \mapsto \tau_k + (\min\{\tau_{k+1}, t\} - \tau_k)\, 0.5 + (t - \tau_{k+1})\, 1_{\{t > \tau_{k+1}\}}$
for $\tau_k = 1 < \tau_{k+1} = 2$, which corresponds to decreasing survival by skipping the treatment
in the interval $(\tau_k, \tau_{k+1}]$.
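
As a rough numerical illustration of such a model, the sketch below evaluates a shift function of the kind just described. The function name `shift` and the linear predictor $\psi_1 a_k + \psi_2 a_k a_{k-1} + \psi_3 a_k l_k$ are assumptions made for this example. With $\psi = 0$ the function returns its argument, and with $\psi_1 = \log 0.5$, $a_k = 1$, $\tau_k = 1$, $\tau_{k+1} = 2$ it gives the non-identity curve described for Figure 2.

```python
import math


def shift(t, tau_k, tau_next, a_k, a_prev, l_k, psi):
    """Example SNFTM shift function gamma^psi_{l_k, a_k}(t) for t >= tau_k.

    Time spent in (tau_k, tau_next] is rescaled by exp(linear predictor);
    time after tau_next is left unchanged. The linear predictor below is an
    assumed form in which every term carries a factor a_k.
    """
    psi1, psi2, psi3 = psi
    rate = math.exp(psi1 * a_k + psi2 * a_k * a_prev + psi3 * a_k * l_k)
    inside = min(tau_next, t) - tau_k   # time spent within the interval
    after = max(t - tau_next, 0.0)      # (t - tau_next) * 1{t > tau_next}
    return tau_k + inside * rate + after


# psi = 0 gives the identity; psi1 = log(0.5) halves time spent in (1, 2].
print(shift(2.5, 1.0, 2.0, a_k=1, a_prev=0, l_k=0.0, psi=(0.0, 0.0, 0.0)))
print(shift(1.5, 1.0, 2.0, a_k=1, a_prev=0, l_k=0.0, psi=(math.log(0.5), 0.0, 0.0)))
```

A rescaling factor below one places the shifted time before the observed time, which is the "below the identity" situation in which skipping the dose shortens survival.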

5 Mimicking counterfactual outcomes 

In the next two sections we present two methods for estimating the parameter $\psi$ in a
structural nested failure time model. Theorem 5.1 below is basic for both methods. Consider
the following transformation of the observation $(\bar L_K, \bar A_K, T)$, using the "true" shift
functions $\gamma$ (given by (4)):

$$T_0^{\gamma} = \gamma_{\bar L_0, \bar A_0} \circ \gamma_{\bar L_1, \bar A_1} \circ \cdots \circ \gamma_{\bar L_{p(T)}, \bar A_{p(T)}}(T), \qquad (6)$$

where $p(T) = \max\{k : \tau_k < T\}$. The application of the function
$\gamma_{\bar L_{p(T)}, \bar A_{p(T)}}$ to $T$ annihilates
the effect of the last treatment $A_{p(T)}$, and each further application of a shift function
annihilates the effect of an earlier treatment. This explains the following theorem, which
is proved in Appendix B.

Theorem 5.1 (mimicking counterfactual outcomes). The variable $T_0^{\gamma}$ defined in (6)
possesses survival function $s_{\bar 0}$. Furthermore, for every $k \ge 0$,

$$A_k \perp\!\!\!\perp T_0^{\gamma} \mid \bar L_k, \bar A_{k-1}, T > \tau_k. \qquad (7)$$

The variable $T_0^{\gamma}$ is a (deterministic) function of the data vector $(\bar L_K, \bar A_K, T)$, through
the (unknown) family of shift-functions $\gamma$. If the shift functions $\gamma$ were known, then
we would be able to "mimic" the survival time without treatment by calculating the
transformation $T_0^{\gamma}$. By the preceding theorem this variable is distributed according to $s_{\bar 0}$ and
hence under the conditions of Theorem 3.1 possesses the same distribution as $T^g$ for $g = 0$,
the null treatment.

Equation (7) shows that the variable $T_0^{\gamma}$ also shares the "no unmeasured confounding"
property (Assumption 2.4) of counterfactual variables (in a slightly stronger form).
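
The transformation (6) can be sketched in a few lines. The code below is only an illustration under simplifying assumptions that are not part of the paper: the shift function for interval $k$ depends on the data only through the current dose, rescaling time in $(\tau_k, \tau_{k+1}]$ by $e^{\psi A_k}$, and the names `shift` and `mimic_t0` are hypothetical.

```python
import math


def shift(t, k, taus, factor):
    """Piecewise-linear shift function for interval k: time spent in
    (tau_k, tau_{k+1}] is multiplied by `factor`, later time is unchanged."""
    tau_k = taus[k]
    tau_next = taus[k + 1] if k + 1 < len(taus) else math.inf
    return tau_k + (min(tau_next, t) - tau_k) * factor + max(t - tau_next, 0.0)


def mimic_t0(T, taus, doses, psi):
    """Apply gamma_{p(T)}, then gamma_{p(T)-1}, ..., then gamma_0 to T."""
    # p(T) = max{k : tau_k < T}; requires T > taus[0].
    p = max(k for k, tau in enumerate(taus) if tau < T)
    t = T
    for k in range(p, -1, -1):
        t = shift(t, k, taus, math.exp(psi * doses[k]))
    return t


taus = [0.0, 1.0, 2.0]   # clinic visit times tau_0 < tau_1 < tau_2
doses = [1, 0, 1]        # observed treatments A_0, A_1, A_2
print(mimic_t0(2.5, taus, doses, psi=math.log(2.0)))
```

The loop runs from the last interval entered before the event back to the first, which mirrors the order of composition in (6): the innermost function removes the effect of the most recent dose.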

6 Maximum likelihood estimation 

In this section we consider likelihood based inference for the parameter $\psi$ in a given
SNFTM. Clearly this requires that we make the parameter $\psi$ visible in the density of
the observation $(\bar L_K, \bar A_K, T)$. We first show that this can be achieved using the
transformation $T_0^{\gamma} = T_0^{\gamma}(T, \bar L_K, \bar A_K)$ defined in (6), which will depend on $\psi$ if we
use an SNFTM for $\gamma$.

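Theorem 6.1 (the likelihood rewritten). Suppose that Assumption 4.1 (admissible baseline
treatment regime) holds. Suppose moreover that $(T, \bar L_K, \bar A_K)$ has a Lebesgue density,
and that the function $t \mapsto s_{\bar l_k, (\bar a_k, 0)}(t)$ is continuously differentiable in $t$, for all
$\bar l_k$, $\bar a_k$ with $P(\bar L_k = \bar l_k, \bar A_k = \bar a_k, T > \tau_k) > 0$, with strictly negative
derivative except for at most finitely many points. Then the joint density of $(T, \bar L_K, \bar A_K)$ can be
rewritten as

$$f_{T, \bar L, \bar A}(t, \bar l, \bar a) = \frac{\partial}{\partial t} t_0^{\gamma}(t, \bar l_p, \bar a_p)\; f_{T_0^{\gamma}}\big(t_0^{\gamma}(t, \bar l_p, \bar a_p)\big)\; P(L_0 = l_0 \mid T_0^{\gamma} = t_0^{\gamma})\; P(A_0 = a_0 \mid L_0 = l_0)$$
$$\times \prod_{k=1}^{p} \Big\{ P(L_k = l_k \mid \bar L_{k-1} = \bar l_{k-1}, \bar A_{k-1} = \bar a_{k-1}, T_0^{\gamma} = t_0^{\gamma})\; P(A_k = a_k \mid \bar L_k = \bar l_k, \bar A_{k-1} = \bar a_{k-1}, T > \tau_k) \Big\},$$

where $\tau_p < t \le \tau_{p+1}$ and

$$t_0^{\gamma}(t, \bar l_p, \bar a_p) = \gamma_{\bar l_0, \bar a_0} \circ \gamma_{\bar l_1, \bar a_1} \circ \cdots \circ \gamma_{\bar l_p, \bar a_p}(t).$$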

Proof. Under the conditions of the theorem,

$$(T, \bar L, \bar A) \mapsto (T_0^{\gamma}, \bar L, \bar A) = (t_0^{\gamma}(T), \bar L, \bar A)$$

is a one-to-one mapping. Thus if $t_0^{\gamma}$ were continuously differentiable everywhere, then the
identity

$$f_{T, \bar L, \bar A}(t, \bar l, \bar a) = \frac{\partial}{\partial t} t_0^{\gamma}(t, \bar l, \bar a)\; f_{T_0^{\gamma}, \bar L, \bar A}\big(t_0^{\gamma}(t, \bar l, \bar a), \bar l, \bar a\big) \qquad (8)$$

would be immediate from the change of variables formula. We show that (8) holds under
the conditions of Theorem 6.1 too. Next the assertion of the theorem follows by repeated
conditioning and using Theorem 5.1.

To prove (8) in general, note that the probability space consists of countably many sets
of the form $\{\bar L_K = \bar l_K, \bar A_K = \bar a_K\}$, so that by countable additivity of measures it suffices to
prove (8) on each of these sets that has probability greater than 0. On each of these sets,
$t_0^{\gamma}$ is one-to-one and continuously differentiable except for at finitely many points: it is the
composition of finitely many functions $\gamma_{\bar l_k, \bar a_k}$ and under the assumptions of Theorem 6.1

$$\frac{\partial}{\partial t} \gamma_{\bar l_k, \bar a_k}(t) = \frac{s'_{\bar l_k, (\bar a_k, 0)}(t)}{s'_{\bar l_k, (\bar a_{k-1}, 0)}\big(\gamma_{\bar l_k, \bar a_k}(t)\big)}$$

exists and is continuous except for at most finitely many $t$. Thus, from the change of
variables formula, equation (8) is true on each set $\{\bar L_K = \bar l_K, \bar A_K = \bar a_K\}$, as we needed to
show. □

Regarding the conditions of Theorem 6.1 we note that the baseline treatment regime
may not be constant, whence the death rate under the baseline regime may change at the time points $\tau_m$.
However, it will often be reasonable to assume differentiability of the function
$s_{\bar l_k, (\bar a_k, 0)}(t)$ on all intervals $(\tau_m, \tau_{m+1})$.

For likelihood inference concerning the parameter $\psi$ of an SNFTM, we shall generally
drop the factors

$$P(A_k = a_k \mid \bar L_k = \bar l_k, \bar A_{k-1} = \bar a_{k-1}, T > \tau_k) \qquad (9)$$

from the likelihood. All other terms involve $\psi$ through $t_0^{\gamma}$, and we will need to specify
models for these terms in order to proceed, typically involving additional parameters.
Given such models we can estimate $\psi$ by the corresponding coordinate of the maximum
likelihood estimator obtained by maximizing the likelihood over all parameters. Of course
finding this maximizer may be a formidable task.

Since the null hypothesis of no treatment effect is equivalent to the functions $\gamma_{\bar l_k, \bar a_k}$ being
equal to the identity function, by Theorem 4.2, this hypothesis can be fully expressed in
the parameter $\psi$. For instance, we could, by convention, construct our SNFTM in such a
way that this null hypothesis is equivalent to $H_0 : \psi = 0$. Then we can obtain a likelihood-based
test for the null hypothesis of no treatment effect using the Wald, score or likelihood
ratio test for $H_0 : \psi = 0$.



7 G-estimation

The likelihood methods of the preceding section require the specification of models for the
conditional laws of the covariates, among others, next to a specification of an SNFTM. In
this section we present an alternative approach to testing and estimation of the parameter
in an SNFTM, called G-estimation in Robins (1998). This approach is based on models for
the conditional distributions of the treatment variables given in (9). It can be considered
a semiparametric approach, where the parametric component refers to the laws (9) and
all other laws appearing in Theorem 6.1 form the nonparametric, unspecified component.
From a practical perspective modelling the distributions (9) is more attractive than modelling
the remaining laws in Theorem 6.1, as it may be expected that doctors have clear
ideas, at least qualitatively, about how they reach their decisions about treatment.

The method of G-estimation is based on the conditional independence of the "blipped-up"
variable $T_0^{\gamma}$ defined in (6) and the treatment variable $A_k$ given the variables $\bar L_k$ and
$\bar A_{k-1}$, for each $k$, asserted by Theorem 5.1. Consider first testing the null hypothesis
$H_0 : \gamma = \gamma_0$ for a given shift function $\gamma_0$. Theorem 5.1 gives, under the null hypothesis,
that, for each $k$,

$$A_k \perp\!\!\!\perp T_0^{\gamma_0} \mid \bar L_k, \bar A_{k-1}, T > \tau_k. \qquad (10)$$

This is an assertion about the observed data vector $(\bar L_K, \bar A_K, T)$ only. Any test for the
validity of (10) is therefore a test for the null hypothesis $H_0 : \gamma = \gamma_0$.
In order to operationalize this idea we adopt for each $k$ a model

$$P_{\theta}(A_k = a_k \mid \bar L_k = \bar l_k, \bar A_{k-1} = \bar a_{k-1}, T > \tau_k)$$

for the prediction of treatment given the past, indexed by some parameter $\theta$. Such a
model tries to explain the treatment $A_k$ by the values of the covariates up to time $\tau_k$ and
the preceding treatment history. Formula (10) implies that, under the null hypothesis,
inclusion of the variable $T_0^{\gamma_0}$ as an extra explanatory variable is useless for the prediction
of $A_k$, if past covariate- and treatment information $\bar L_k$ and $\bar A_{k-1}$ are known. Thus given
a term of the form $\alpha T_0^{\gamma_0}$ in the prediction model with $\alpha$ a parameter, the true value of
$\alpha$ must be equal to 0, because of (10). It follows that we can test the null hypothesis
$H_0 : \gamma = \gamma_0$ by adding a term $\alpha T_0^{\gamma_0}$ anyway, and next test the null hypothesis $H_0 : \alpha = 0$
in the model indexed by the overall parameter $(\theta, \alpha)$. Depending on the chosen types of
model such a test, for instance a Wald, score or the likelihood ratio test, can be performed
by standard statistical software.

This procedure is particularly simple for testing the null hypothesis of no treatment
effect. In view of Theorem 4.2 this is equivalent to testing whether the function $\gamma$ is equal
to the identity function, i.e. we take $\gamma_0$ in the preceding equal to the identity function. In
this case the variable $T_0^{\gamma_0}$ is equal to $T$, and hence the G-estimation procedure reduces
to testing the null hypothesis $H_0 : \alpha = 0$ in a regression model that tries to explain the
variable $A_k$ by the variables $\bar L_k$, $\bar A_{k-1}$ and $\alpha T$. The null hypothesis of no treatment effect
can be tested in this way without specifying a model for the shift function $\gamma$.

For a specific example, suppose that the treatment variables $A_k$ are binary-valued.
Then a logistic regression model is a standard choice for modelling the probabilities (9).
We might add the variable $\alpha T_0^{\gamma_0}$ to a logistic regression model to form the model

$$P_{\theta, \alpha}(A_k = 1 \mid \bar L_k, \bar A_{k-1}, T > \tau_k, T_0^{\gamma_0}) = \frac{1}{1 + \exp\{-\theta f_k(\bar L_k, \bar A_{k-1}) - \alpha g_k(T_0^{\gamma_0})\}},$$

for given, known functions $f_k$ and $g_k$, and unknown parameters $\theta$ and $\alpha$. A test for the
null hypothesis $H_0 : \alpha = 0$ can be carried out by standard software for logistic regression.

Given an SNFTM $\psi \mapsto \gamma^{\psi}$ for the shift functions $\gamma$, indexed by a parameter $\psi$, we can
extend the preceding testing methods to full inference on the parameter $\psi$. First, we can
obtain confidence regions for $\psi$ by inverting the tests for the null hypotheses $H_0 : \gamma = \gamma^{\psi}$,
in the usual way: the value $\psi$ belongs to the confidence region if the corresponding null
hypothesis $H_0$ is not rejected.

A natural estimator of $\psi$ would be the center of a confidence set, or, alternatively, a
value of $\psi$ for which $T_0^{\gamma^{\psi}}$ contributes the least to the prediction model for treatment given
the past. That is, the $\psi$ for which the fitted model for

$$P_{\theta, \alpha}(A_k = a_k \mid \bar L_k, \bar A_{k-1}, T > \tau_k, \alpha T_0^{\gamma^{\psi}}) \qquad (11)$$

does not include the variable $T_0^{\gamma^{\psi}}$, i.e. $\hat\alpha = 0$. For each given value of the parameter $\psi$ of
the SNFTM we may obtain estimators $\hat\theta(\psi)$ and $\hat\alpha(\psi)$ for the parameters $\theta$ and $\alpha$, based
on the observations $(\bar L_K^i, \bar A_K^i, T^i)$ on $n$ persons. Then we define $\hat\psi$ as the solution of the
equation

$$\hat\alpha(\psi) = 0.$$

If we use a logistic regression model, then the estimators $\hat\theta$ and $\hat\alpha$ can be obtained with
standard software, for each given value of $\psi$. The estimator $\hat\psi$ can next be found by a grid
search method. Alternatively, we can implement a direct numerical method for estimating
$\psi$.
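
To make the search for $\hat\psi$ concrete, here is a minimal sketch for a deliberately simplified setting that is an assumption of this example, not the general procedure: a single treatment decision at time 0, a binary treatment, no covariates, and the one-parameter shift $T_0(\psi) = T e^{\psi A}$. In a logistic model for $A$ with an intercept and the regressor $T_0(\psi)$, the fitted coefficient of $T_0(\psi)$ is zero exactly when the score $\sum_i (A_i - \bar A)\, T_{0,i}(\psi)$ vanishes, so the code solves that equation by bisection rather than refitting the model on a grid. The function names and the toy data are hypothetical.

```python
import math


def score(psi, data):
    """Score for alpha at alpha = 0 in a logistic model for A with an intercept
    and the extra regressor T_0(psi) = T * exp(psi * A)."""
    a_bar = sum(a for a, _ in data) / len(data)
    return sum((a - a_bar) * t * math.exp(psi * a) for a, t in data)


def g_estimate(data, lo=-5.0, hi=5.0, tol=1e-10):
    """Solve score(psi) = 0 by bisection; assumes a sign change on [lo, hi]."""
    f_lo = score(lo, data)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = score(mid, data)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return 0.5 * (lo + hi)


# Toy data (hypothetical): identical baseline times in both arms, true psi = 0.5.
psi_true = 0.5
baseline = [1.0, 2.0, 3.0, 4.0]
data = [(0, t0) for t0 in baseline] + [(1, t0 * math.exp(-psi_true)) for t0 in baseline]
print(round(g_estimate(data), 6))
```

With several visits and covariates the same idea applies, but the score would have to be replaced by the fitted coefficient (or score statistic) from the chosen treatment model, evaluated at each candidate $\psi$.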

The procedures just outlined may appear a bit unusual, in view of their indirect nature.
However, in most cases they can also be interpreted in a standard way. For instance, the
procedure for estimating $\alpha$ for given $\psi$ will often be equivalent to solving $\hat\alpha = \hat\alpha(\psi)$ from
an estimating equation of the type

$$\sum_{i=1}^{n} h_{\alpha, \psi}(\bar L_K^i, \bar A_K^i, T^i) = 0.$$

Then $\hat\psi$ satisfying $\hat\alpha(\hat\psi) = 0$ will satisfy the estimating equation

$$\sum_{i=1}^{n} h_{0, \hat\psi}(\bar L_K^i, \bar A_K^i, T^i) = 0.$$

Because $\alpha(\psi_0) = 0$ for the true value $\psi_0$ of $\psi$, the true value of $\psi$ is a solution to the
equation

$$E\, h_{0, \psi}(\bar L_K, \bar A_K, T) = 0.$$

In other words, $\hat\psi$ will be the solution of an unbiased estimating equation, whence the
(asymptotic) properties of $\hat\psi$ can be ascertained with the usual theory for M-estimators
(e.g. Van der Vaart (1998)). For instance, we may expect the sequence $\sqrt{n}(\hat\psi - \psi_0)$ to be
asymptotically (as $n \to \infty$) normal with mean zero and variance

$$\frac{E\, h_{0, \psi_0}^2(\bar L_K, \bar A_K, T)}{\Big( \frac{\partial}{\partial \psi} E\, h_{0, \psi}(\bar L_K, \bar A_K, T) \big|_{\psi = \psi_0} \Big)^2}.$$

Lok (2001) has studied the validity of these results, and has thus justified the preceding
procedures.

8 Summary and extensions



We have shown that the AZT treatment regime-specific, counterfactual AIDS-free survival
curves $P(T^g > t)$ are identified for all evaluable treatment regimes $g$ if our maintained
assumption of no unmeasured confounding, Assumption 2.4, is met. This assumption
will hold if the investigator has succeeded in recording in $\bar l_k$ data on all covariates that,
conditional on past AZT history $\bar a_{k-1}$, predict both the AZT dosage rate $a_k$ in $(\tau_k, \tau_{k+1}]$
and the random variables $T^g$ representing time to AIDS had, contrary to fact, all subjects
followed an AZT treatment history consistent with regime $g$.

Further, we have shown that, under the assumption of no unmeasured confounding,
Assumption 2.4, the shift functions $\gamma$ of an SNFTM are the identity function if and only
if the G-null hypothesis of no causal effect of AZT on time to AIDS is true. We have
expressed the likelihood of the observable random variables $(T, \bar L_K, \bar A_K)$ in terms of the
transformed random variables $(T_0^{\gamma}, \bar L_K, \bar A_K)$. We then developed parametric likelihood
based tests of the hypothesis $\gamma = \mathrm{id}$ by specifying fully parametric models for the joint
distribution of $(T, \bar L_K, \bar A_K)$ in terms of the transformed random variables $(T_0^{\gamma}, \bar L_K, \bar A_K)$.

Even in the absence of censoring or missing data, a major limitation of the fully parametric
likelihood-based tests of the null hypothesis $\gamma = \mathrm{id}$ from Section 6 is that
misspecification of the parametric models for the distribution of $L_k$ given $\bar L_{k-1}$, $\bar A_{k-1}$ and $T_0^{\gamma}$,
or for the distribution of $T^0$, can cause the true $\alpha$-level of the test to deviate from the
nominal $\alpha$-level. This limitation raised the question of whether it is possible to construct
$\alpha$-level tests of the null hypothesis $\gamma = \mathrm{id}$ and of more general hypotheses concerning
$\gamma$, which are asymptotically distribution-free. A closely related question is whether there
exist $n^{1/2}$-consistent asymptotically normal estimators of the parameter $\psi$ of a correctly
specified structural nested failure time model if the joint distribution of the observables
$(\bar L_K, \bar A_K, T)$ is otherwise unspecified, i.e. if the distribution of $L_k$ given $\bar L_{k-1}$, $\bar A_{k-1}$ and
$T_0^{\gamma}$ and the distribution of the variable $T^0$ are left completely unspecified. In Section 7 we
showed that one only needs to specify a parametric model for the shift function $\gamma$, which
models the causal effect of one treatment dosage given the past, and a parametric model
for the distribution of actual treatment dosage given past treatment- and covariate history.
Doctors will usually have clear ideas about this latter distribution of treatment decisions.
Moreover, the doctors' interest will often be in the causal effect of one treatment dosage
given the past.

If the null hypothesis of no treatment effect has been rejected and the parameter $\psi$ of the
shift function $\gamma$ has been estimated, one might wish to estimate the survival distribution
$t \mapsto P(T^g > t)$ of the outcome under specific treatment regimes $g$ in a way consistent
with the estimator $\hat\psi$. This can be done by estimating the distribution of $T^0$ (e.g. by the
empirical distribution of $T_0^{\hat\psi}$) and the empirical distribution of $L_k$ given $\bar L_{k-1}$, $\bar A_{k-1}$ and
$T_0^{\hat\psi}$ ($k = 0, \ldots, K$) for histories $\bar L_{k-1}$, $\bar A_{k-1}$ consistent with $g$. An approximate sample $\hat T_i^g$
($i = 1, 2, \ldots$) from the distribution of $T^g$ could then be generated by using these estimated
distributions: first draw $T_0'$ from the distribution of $T^0$, then draw $L_0'$ from the distribution
of $L_0$ given $T_0 = T_0'$, then put $A_0' = g(L_0')$, then draw $L_1'$ from the distribution of $L_1$ given
$T_0 = T_0'$, $A_0 = A_0'$ and $L_0 = L_0'$, etcetera. Finally put

$$\hat T^g = \gamma^{-1}_{\bar L_K', \bar A_K'} \circ \cdots \circ \gamma^{-1}_{\bar L_0', \bar A_0'}(T_0').$$

This variable will be generated from the desired distribution.
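
The final step of this recipe, mapping a drawn baseline time back to an outcome under a regime, can be sketched as follows. This is an illustration under assumptions that are not in the paper: the regime is a static dose plan, covariates are ignored (so no covariate draws are needed), and the shift function for interval $k$ rescales time in $(\tau_k, \tau_{k+1}]$ by $e^{\psi a_k}$. The names `inverse_shift` and `outcome_under_regime` are hypothetical.

```python
import math


def inverse_shift(u, k, taus, factor):
    """Inverse of the piecewise-linear shift for interval k, which multiplies
    time spent in (tau_k, tau_{k+1}] by `factor` and leaves later time unchanged."""
    tau_k = taus[k]
    tau_next = taus[k + 1] if k + 1 < len(taus) else math.inf
    if u <= tau_k + (tau_next - tau_k) * factor:   # image of (tau_k, tau_{k+1}]
        return tau_k + (u - tau_k) / factor
    return tau_next + (u - tau_k - (tau_next - tau_k) * factor)


def outcome_under_regime(t0, taus, doses, psi):
    """Map a baseline time t0 (> taus[0]) to the outcome under the static dose
    plan `doses` by applying the inverse shifts for k = 0, 1, ... until the
    event falls before the next visit."""
    t = t0
    for k in range(len(taus)):
        t = inverse_shift(t, k, taus, math.exp(psi * doses[k]))
        tau_next = taus[k + 1] if k + 1 < len(taus) else math.inf
        if t <= tau_next:   # event occurs in the current interval: stop
            break
    return t


taus = [0.0, 1.0, 2.0]        # clinic visit times
doses = [1, 0, 1]             # static dose plan (hypothetical)
t0_values = [0.5, 1.5, 4.0]   # stand-in for draws from the estimated law of T^0
print([outcome_under_regime(t0, taus, doses, math.log(2.0)) for t0 in t0_values])
```

For a regime that depends on covariates, the dose for interval $k$ would instead be computed from the simulated covariate history before each inverse shift is applied.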

Extensions of the results of this paper that allow for censoring and missing data are
discussed in Robins (1988, 1992, 1993, 1998), and Robins et al. (1992). Extensions of
G-tests and estimators to continuous $L_k$ and $A_k$ are discussed in Robins (1992, 1993),
Robins et al. (1992), and Gill and Robins (2001). Robins (1998) and Lok (2001) show
that the results in this paper can be extended to allow for jumps in the treatment- and
covariate processes in continuous time.



A Alternative formulation of the null hypothesis 

In this appendix we prove Theorem 4.2 through two lemmas. The first lemma shows that
if all functions $\gamma$ are equal to the identity function, then all survival curves $P(T^g > t)$ for
evaluable treatment regimes are the same. The second lemma shows the reverse.

Lemma A.1 Suppose that Assumptions 2.4 (no unmeasured confounding), 2.3 (consistency)
and 4.1 (admissible baseline treatment regime) hold. If $\gamma_{\bar l_k, \bar a_k}$ is the identity function
for all $k$, $\bar l_k \in \bar{\mathcal{L}}_k$ and $\bar a_k \in \bar{\mathcal{A}}_k$ with
$P(\bar L_k = \bar l_k, \bar A_k = \bar a_k, T > \tau_k) > 0$, then all survival
curves $P(T^g > t)$ for evaluable treatment regimes $g$ are the same.



Proof. We show that for all evaluable treatment regimes $g$ and all $\bar l_k$ with
$P(\bar L_k = \bar l_k, \bar A_k = \bar g_k(\bar l_k), T > \tau_k) > 0$, the conditional distributions of the
counterfactual variables $T^g$ and $T^{(\bar g_{k-1}(\bar l_{k-1}), 0)}$ given
$\bar L_k = \bar l_k$, $\bar A_{k-1} = \bar g_{k-1}(\bar l_{k-1})$, $T > \tau_k$ are the same, i.e.,
for $t > \tau_k$,

$$s_{\bar l_k, g}(t) = s_{\bar l_k, (\bar g_{k-1}(\bar l_{k-1}), 0)}(t). \qquad (12)$$

For $k = -1$ this should be read as $s_g(t) = s_{\bar 0}(t)$, which implies Lemma A.1.

We prove (12) by backward induction on $k$, for $t$ fixed. With $\tau_p$ the last clinic visit
time strictly before $t$, we start with $k = p$ and end with $k = 0$. The statement for $k = -1$
follows from the statement for $k = 0$ by summation over $l_0$.

Basis: For $k = p$, by the definition of $s$,

$$s_{\bar l_p, g}(t) = P(T > t \mid \bar L_p = \bar l_p, \bar A_p = \bar g_p(\bar l_p), T > \tau_p) = s_{\bar l_p, (\bar g_p(\bar l_p), 0)}(t),$$

by another application of the definition of $s$. The right side is equal to
$s_{\bar l_p, (\bar g_{p-1}(\bar l_{p-1}), 0)}(t)$ by the assumption that the function
$\gamma_{\bar l_p, \bar a_p}$, with $\bar a_p = \bar g_p(\bar l_p)$, is the identity function.

Induction step: we suppose that (12) is true for $k \ge 1$ and establish (12) for $k - 1$. By
straightforward algebra using the definition of $s_{\bar l_k, g}$,

$$s_{\bar l_{k-1}, g}(t) = P(T > \tau_k \mid \bar L_{k-1} = \bar l_{k-1}, \bar A_{k-1} = \bar g_{k-1}(\bar l_{k-1}), T > \tau_{k-1})$$
$$\times \sum_{l_k} P(L_k = l_k \mid \bar L_{k-1} = \bar l_{k-1}, \bar A_{k-1} = \bar g_{k-1}(\bar l_{k-1}), T > \tau_k)\; s_{\bar l_k, g}(t).$$

Here we can replace $s_{\bar l_k, g}$ using the induction hypothesis, giving that the preceding display
is equal to

$$P(T > \tau_k \mid \bar L_{k-1} = \bar l_{k-1}, \bar A_{k-1} = \bar g_{k-1}(\bar l_{k-1}), T > \tau_{k-1})$$
$$\times \sum_{l_k} P(L_k = l_k \mid \bar L_{k-1} = \bar l_{k-1}, \bar A_{k-1} = \bar g_{k-1}(\bar l_{k-1}), T > \tau_k)\; s_{\bar l_k, (\bar g_{k-1}(\bar l_{k-1}), 0)}(t)$$
$$= s_{\bar l_{k-1}, (\bar g_{k-1}(\bar l_{k-1}), 0)}(t) = s_{\bar l_{k-1}, (\bar g_{k-2}(\bar l_{k-2}), 0)}(t),$$

where we use the definition of $s$ in the first equality, and the assumption that
$\gamma_{\bar l_{k-1}, \bar a_{k-1}}$, for $\bar a_{k-1} = \bar g_{k-1}(\bar l_{k-1})$, is the identity function in the second. □



Lemma A.2 Suppose that Assumptions 2.4 (no unmeasured confounding), 2.3 (consistency)
and 4.1 (admissible baseline treatment regime) hold. If the survival curves $P(T^g > t)$
are the same for all evaluable treatment regimes $g$, then the shift function $\gamma_{\bar l_k, \bar a_k}$ is the
identity for all $k$, $\bar l_k \in \bar{\mathcal{L}}_k$ and $\bar a_k \in \bar{\mathcal{A}}_k$ with
$P(\bar L_k = \bar l_k, \bar A_k = \bar a_k, T > \tau_k) > 0$.



Proof. Let fixed $\bar l_k$, $\bar a_k$ with $P(\bar L_k = \bar l_k, \bar A_k = \bar a_k, T > \tau_k) > 0$ be given. To prove that
$\gamma_{\bar l_k, \bar a_k}$ is the identity we need to show that, for all $t > \tau_k$,

$$s_{\bar l_k, (\bar a_k, 0)}(t) = s_{\bar l_k, (\bar a_{k-1}, 0)}(t). \qquad (13)$$

Define a treatment regime $g^1$ by the coordinate functions $g^1_m(\bar l_m) = a_m$ if $\bar l_m$ is the initial
part of $\bar l_k$, and by $g^1_m(\bar l_m) = 0$ otherwise. Define a second treatment regime $g^2$ by
$g^2 = (\bar g^1_{k-1}, 0)$. Because of Assumption 4.1 and because
$P(\bar L_k = \bar l_k, \bar A_k = \bar a_k, T > \tau_k) > 0$, the treatment regimes $g^1$ and $g^2$ are evaluable.
Thus, by assumption, we have that $P(T^{g^1} > t) = P(T^{g^2} > t)$, and these probabilities are given by
the G-computation formula, given in Theorem 3.1. For the first regime this formula can be written in
the form

$$P(T^{g^1} > t) = \sum_{\bar l_k' \ne \bar l_k} \prod_{m=0}^{k} \Big\{ P(T > \tau_m \mid \bar L_{m-1} = \bar l_{m-1}', \bar A_{m-1} = \bar g^1(\bar l_{m-1}'), T > \tau_{m-1})$$
$$\times P(L_m = l_m' \mid \bar L_{m-1} = \bar l_{m-1}', \bar A_{m-1} = \bar g^1(\bar l_{m-1}'), T > \tau_m) \Big\}\; s_{\bar l_k', g^1}(t)$$
$$+ \prod_{m=0}^{k} \Big\{ P(T > \tau_m \mid \bar L_{m-1} = \bar l_{m-1}, \bar A_{m-1} = \bar g^1(\bar l_{m-1}), T > \tau_{m-1})$$
$$\times P(L_m = l_m \mid \bar L_{m-1} = \bar l_{m-1}, \bar A_{m-1} = \bar g^1(\bar l_{m-1}), T > \tau_m) \Big\}\; s_{\bar l_k, g^1}(t).$$

A similar expression holds for the treatment regime $g^2$. Because the regimes $g^1$ and $g^2$ are
constructed to be the same up to time $\tau_{k-1}$, only the second terms of the sums differ
between these two expressions. Even there, the product preceding $s_{\bar l_k, g^1}(t)$ and
$s_{\bar l_k, g^2}(t)$ is the same for $g^1$ and $g^2$. Moreover, this factor is strictly positive, since
$P(\bar L_k = \bar l_k, \bar A_k = \bar a_k, T > \tau_k) > 0$ by assumption. The equality of $P(T^{g^1} > t)$ and
$P(T^{g^2} > t)$ therefore implies the equality of $s_{\bar l_k, g^1}(t)$ and $s_{\bar l_k, g^2}(t)$. By
construction of $g^1$ and $g^2$, equation (13) and hence Lemma A.2 follow. □



B Mimicking counterfactual outcomes 

For $t > 0$ define $p(t)$ by $\tau_{p(t)} < t \le \tau_{p(t)+1}$, i.e. $\tau_{p(t)}$ is the last clinic visit time
strictly before $t$. For $k \ge 0$ with $k \le p(T)$ we define a random variable $T_k^{\gamma}$ by

$$T_k^{\gamma} = \gamma_{\bar L_k, \bar A_k} \circ \cdots \circ \gamma_{\bar L_{p(T)}, \bar A_{p(T)}}(T).$$

For $k > p(T)$ we interpret the (empty) composition of transformations on the right as the
identity and define $T_k^{\gamma} = T$.

In this appendix we prove the following theorem, which generalizes the first part of
Theorem 5.1. This theorem implies the second part, since $T_0^{\gamma}$ is a function of
$(\bar L_{k-1}, \bar A_{k-1}, T_k^{\gamma})$.

Theorem B.1 For $t > \tau_k$ and every $\bar l_k$, $\bar a_k$ with $P(\bar L_k = \bar l_k, \bar A_k = \bar a_k, T > \tau_k) > 0$,

$$P(T_k^{\gamma} > t \mid \bar L_k = \bar l_k, \bar A_k = \bar a_k, T > \tau_k) = P(T_k^{\gamma} > t \mid \bar L_k = \bar l_k, \bar A_{k-1} = \bar a_{k-1}, T > \tau_k) = s_{\bar l_k, (\bar a_{k-1}, 0)}(t).$$

Proof. We use backward induction on $k$, starting with $k = K$ and ending with $k = 0$.
For $k = K$,

$$P(T_K^{\gamma} > t \mid \bar L_K = \bar l_K, \bar A_K = \bar a_K, T > \tau_K) = P(\gamma_{\bar l_K, \bar a_K}(T) > t \mid \bar L_K = \bar l_K, \bar A_K = \bar a_K, T > \tau_K)$$
$$= P(T > \gamma^{-1}_{\bar l_K, \bar a_K}(t) \mid \bar L_K = \bar l_K, \bar A_K = \bar a_K, T > \tau_K)$$
$$= s_{\bar l_K, (\bar a_K, 0)}\big(\gamma^{-1}_{\bar l_K, \bar a_K}(t)\big)$$
$$= s_{\bar l_K, (\bar a_{K-1}, 0)}(t).$$

Here the first equality is immediate from the definition of $T_K^{\gamma}$, the second follows by the
strict monotonicity of the functions $\gamma$, the third by definition of $s$ and the last by definition
of $\gamma$.

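Induction step: we show that if the theorem is true for $k + 1$, then it is also true for $k$.
Just as for $k = K$,

$$P(T_k^{\gamma} > t \mid \bar L_k = \bar l_k, \bar A_k = \bar a_k, T > \tau_k) = P(T_{k+1}^{\gamma} > \gamma^{-1}_{\bar l_k, \bar a_k}(t) \mid \bar L_k = \bar l_k, \bar A_k = \bar a_k, T > \tau_k).$$

Now we distinguish two possibilities: $\gamma^{-1}_{\bar l_k, \bar a_k}(t) \le \tau_{k+1}$ and
$\gamma^{-1}_{\bar l_k, \bar a_k}(t) > \tau_{k+1}$. In the first case,
the right side of the preceding display is equal to

$$P(T > \gamma^{-1}_{\bar l_k, \bar a_k}(t) \mid \bar L_k = \bar l_k, \bar A_k = \bar a_k, T > \tau_k) = s_{\bar l_k, (\bar a_k, 0)}\big(\gamma^{-1}_{\bar l_k, \bar a_k}(t)\big) = s_{\bar l_k, (\bar a_{k-1}, 0)}(t),$$

where the first equality holds because for $s \in (\tau_k, \tau_{k+1}]$ we have that
$\{T_{k+1}^{\gamma} > s\} = \{T > s\}$ by the construction of $T_{k+1}^{\gamma}$, and the last equality holds by the
definition of $\gamma$. In the second possibility, i.e. if $\gamma^{-1}_{\bar l_k, \bar a_k}(t) > \tau_{k+1}$,

$$P(T_{k+1}^{\gamma} > \gamma^{-1}_{\bar l_k, \bar a_k}(t) \mid \bar L_k = \bar l_k, \bar A_k = \bar a_k, T > \tau_k)$$
$$= P(T_{k+1}^{\gamma} > \tau_{k+1} \mid \bar L_k = \bar l_k, \bar A_k = \bar a_k, T > \tau_k)\; P(T_{k+1}^{\gamma} > \gamma^{-1}_{\bar l_k, \bar a_k}(t) \mid \bar L_k = \bar l_k, \bar A_k = \bar a_k, T > \tau_k, T_{k+1}^{\gamma} > \tau_{k+1})$$
$$= P(T > \tau_{k+1} \mid \bar L_k = \bar l_k, \bar A_k = \bar a_k, T > \tau_k) \sum_{l_{k+1}} \Big\{ P(L_{k+1} = l_{k+1} \mid \bar L_k = \bar l_k, \bar A_k = \bar a_k, T > \tau_{k+1})\; P(T_{k+1}^{\gamma} > \gamma^{-1}_{\bar l_k, \bar a_k}(t) \mid \bar L_{k+1} = \bar l_{k+1}, \bar A_k = \bar a_k, T > \tau_{k+1}) \Big\}$$
$$= P(T > \tau_{k+1} \mid \bar L_k = \bar l_k, \bar A_k = \bar a_k, T > \tau_k) \sum_{l_{k+1}} \Big\{ P(L_{k+1} = l_{k+1} \mid \bar L_k = \bar l_k, \bar A_k = \bar a_k, T > \tau_{k+1})\; s_{\bar l_{k+1}, (\bar a_k, 0)}\big(\gamma^{-1}_{\bar l_k, \bar a_k}(t)\big) \Big\}$$
$$= s_{\bar l_k, (\bar a_k, 0)}\big(\gamma^{-1}_{\bar l_k, \bar a_k}(t)\big)$$
$$= s_{\bar l_k, (\bar a_{k-1}, 0)}(t),$$

where in the first step we condition on $T_{k+1}^{\gamma} > \tau_{k+1}$, in the second we use that
$\{T_{k+1}^{\gamma} > \tau_{k+1}\} = \{T > \tau_{k+1}\}$ and we condition on $L_{k+1}$, the third is the induction
step, the fourth follows from the definition of $s_{\bar l_k, (\bar a_k, 0)}$ and the last from the definition
of $\gamma$. □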

Acknowledgement. This paper is based on an earlier manuscript by the first author.